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Microarray analysis to monitor expression activities in thousands 
of genes simultaneously has become routine in biomedical research 
during the past decade. A tremendous amount of expression profiles 
are generated and stored in the public domain and information inte¬ 
gration by meta-analysis to detect differentially expressed (DE) genes 
has become popular to obtain increased statistical power and vali¬ 
dated findings. Methods that aggregate transformed p-value evidence 
have been widely used in genomic settings, among which Fisher’s and 
Stouffer’s methods are the most popular ones. In practice, raw data 
and p-values of DE evidence are often not available in genomic studies 
that are to be combined. Instead, only the detected DE gene lists un¬ 
der a certain p-value threshold (e.g., DE genes with p-value < 0.001) 
are reported in journal publications. The truncated p-value infor¬ 
mation makes the aforementioned meta-analysis methods inapplica¬ 
ble and researchers are forced to apply a less efficient vote counting 
method or naively drop the studies with incomplete information. The 
purpose of this paper is to develop effective meta-analysis methods 
for such situations with partially censored p-values. We developed 
and compared three imputation methods—mean imputation, single 
random imputation and multiple imputation—for a general class of 
evidence aggregation methods of which Fisher’s and Stouffer’s meth¬ 
ods are special examples. The null distribution of each method was 
analytically derived and subsequent inference and genomic analysis 
frameworks were established. Simulations were performed to investi¬ 
gate the type I error, power and the control of false discovery rate 
(FDR) for (correlated) gene expression data. The proposed methods 
were applied to several genomic applications in colorectal cancer, pain 
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and liquid association analysis of major depressive disorder (MDD). 

The results showed that imputation methods outperformed existing 
naive approaches. Mean imputation and multiple imputation meth¬ 
ods performed the best and are recommended for future applications. 

1. Introduction and motivation. Microarray analysis to monitor expres¬ 
sion activities in thousands of genes simultaneously has become routine 
in biomedical research during the past decade. The rapid development in 
biological high-throughput technology results in a tremendous amount of 
experimental data and many data sets are available from public domains 
such as Gene Expression Omnibus (GEO) and ArrayExpress. Since most 
microarray studies have relatively small sample sizes and limited statisti¬ 
cal power, integrating information from multiple transcriptomic studies us¬ 
ing meta-analysis techniques is becoming popular. Microarray meta-analysis 
usually refers to combining multiple transcriptomic studies for detecting 
differentially expressed (DE) genes (or candidate markers). DE gene anal¬ 
ysis identifies genes differentially expressed across two or more conditions 
(e.g., cases and controls) with statistical significance and/or biological sig¬ 
nificance (e.g., fold change). Microarray meta-analysis in many situations 
refers to performing traditional meta-analysis techniques on each gene re¬ 
peatedly and then controlling the false discovery rate (FDR) to adjust p- 
values for multiple comparison [Borovecki et al. (2005); Cardoso et al. (2007); 
Pirooznia, Nagarajan and Deng (2007); Segal et al. (2004)]. Fisher’s method 
[Fisher (1925)] was the first meta-analysis technique introduced in microar¬ 
ray data analysis in 2002 [Rhodes et al. (2002)], followed by Tippett’s min¬ 
imum p-value method in 2003 [Moreau et al. (2003)]. Subsequently, many 
meta-analysis approaches have been used in this field, including extensions 
of existing meta-analysis techniques and novel methods to encompass the 
challenges presented in the genomic setting [Choi et al. (2003), Choi et al. 
(2007), Moerau et al. (2003), Owen (2009), Li and Tseng (2011), and see a 
review paper by Tseng, Ghosh and Feingold (2012)]. 

To combine findings from multiple research studies, one needs to know 
either the effect size or the p-value for each study. Since the differences 
in data structures and statistical hypotheses across multiple studies may 
make the direct combination of effect sizes impossible or the result suspi¬ 
cious, combining p-values from multiple studies is often more appealing. 
Popular p-value combination methods [see review and comparative papers 
Tseng, Ghosh and Feingold (2012) and Chang et al. (2013)] can be split into 
two major categories of evidence aggregation methods (including Fisher’s, 
Stouffer’s and logit methods) and order statistic methods [such as min¬ 
imum p-value, maximum p-value and rth ordered p-value by Song and 
Tseng (2014)]. Evidence aggregation methods utilize summation of certain 
transformations of p-values as their test statistics to aggregate differential 
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expression evidence across studies. Among evidence aggregation methods, 
Fisher’s method is the most well known, in which the test statistic is de¬ 
fined as = —2Y^^=i^og{pk) 1 where K is the number of independent 

studies that are to be combined and pk is the p-value of individual study 
k,l < k < K. Under the null hypothesis of no effect size in all studies and 
assuming that studies are independent and models for assessing p-values 
are correctly specified, follows a chi-square distribution with de¬ 

grees of freedom 2K. Fisher’s method has been popular due to its simplic¬ 
ity and some theoretical properties, including admissibility under Gaussian 
assumption [Birnbaum (1954, 1955)] and asymptotically Bahadur optimal¬ 
ity (ABO) under equal nonzero effect sizes across studies [Littel and Folk 
(1971, 1973)]. Some variations of Fisher’s methods were proposed by us¬ 
ing unequal weights or a trimmed version of Fisher’s test statistic [Olkin 
and Saner (2001)]. Another widely used evidence aggregation method is the 
Stouffer’s method [Stouffer (1949)], in which the test statistic is defined as 
J^stouffer ^ where 4>-i(-) is the inverse cumulative distribu¬ 

tion function (CDF) of standard normal distribution. 

In order to combine p-values, all p-values across studies should be known. 
In genomic applications, however, raw data and thus p-values are often not 
available and usually only a list of statistically significant DE genes (p-value 
less than a threshold) is provided in the publication [Griffith, Jones and 
Wiseman (2006)]. Although many journals and funding agencies have en¬ 
couraged or enforced data sharing policies, the situation has only improved 
moderately. Many researchers are still concerned about data ownership, and 
researchers whose studies are sponsored by private funding are not obligated 
to share data in the public domain. For example, in Chan et al. (2008), 
publications of 23 colorectal cancer versus normal gene expression profiling 
studies were collected to perform meta-analysis to identify consistently re¬ 
ported candidate disease-associated genes. However, only one raw data set 
is available from the Gene Expression Omnibus (GEO; http://www.ncbi. 
nlm.nih.gov/geo/, GSE3294) and most other papers only provided a list of 
DE genes (and their p-values) under a prespecified p-value threshold. A sec¬ 
ond motivating example comes from a microarray meta-analysis study for 
pain research [LaCroix-Fralish et al. (2011)], in which 19 microarray studies 
of pain models were collected to detect the gene signature and patterns of 
pain conditions. Among the 19 studies, only one raw data set was available 
on the author’s website and all the other papers reported the DE gene lists 
under different thresholds. 

In these two motivating examples (details to be shown in Sections 4.1 and 
4.2), the incomplete data forced researchers to either drop studies with in¬ 
complete p-values or apply the convenient vote counting method [Hedges and 
Olkin (1980)]. Dropping studies with incomplete information greatly reduces 
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the statistical power and is obviously not applicable in the two motivating 
examples since the complete data was available in only one study. The con¬ 
ventional vote counting procedure is well known as flawed and low-powered 
[McCarley et al. (2001)]. loannidis et al. (2009) attempted to reproduce 18 
microarray studies published in Nature Genetics during 2005-2006. Inter¬ 
estingly, only two were “in principle” replicated, six “partially” replicated 
and ten could not be reproduced. This result illustrates well the widespread 
difficulty of obtaining raw data or reproducing published results in the field. 
Therefore, developing methods to efficiently combine studies with truncated 
p-value information is an important problem in microarray meta-analysis. 

In this paper, we assume that K = Ki + K 2 studies are combined. In Ki 
studies, the raw gene expression data matrix and sample annotations are 
available and the complete p-values pgi (1 < 5 < G for genes and 1 < i < Ki) 
can be reproduced for meta-analysis. For the remaining K 2 studies, either 
the raw data or annotation is not available. Only incomplete information of 
a DE gene list (under p-value threshold a* for study i) is provided in the 
journal publication. In this situation, the available information is an indica¬ 
tor function to represent whether the p-value of gene g in study i 

is smaller than ccj or not. We outline the paper structure as the following. 
In Section 2 a general class of evidence aggregation meta-analysis methods 
under a univariate scenario was investigated for the mean imputation, the 
single random imputation and the multiple imputation methods, respec¬ 
tively, in which the exact or approximate null distributions were derived 
under the null hypotheses and the results are shown for the Fisher and the 
Stouffer methods. In Section 3.1 simulations of the expression profile were 
performed to compare performance of different methods. Simulations were 
further performed in Section 3.2 using 8 major depressive disorder (MDD) 
and 7 prostate cancer studies where raw data were completely available and 
the true best performance (complete case) could be obtained. In Section 4 
the proposed methods were applied to the two motivating examples. In Sec¬ 
tion 4.1 the methods were applied to 7 colorectal cancer studies, where the 
raw data were available only in 3 studies. In Section 4.2 the proposed meth¬ 
ods were applied to 11 microarray studies of pain conditions, where no raw 
data were available. In Section 4.3 we developed an unconventional appli¬ 
cation of the proposed methods to facilitate the large computational and 
data storage needs in a liquid association meta-analysis. Discussions and 
conclusions are included in Section 5 and all proofs are left in the Appendix. 

2. Methods and inferences. 

2.1. Evidence aggregation meta-analysis methods. Here we consider a 
general class of univariate evidence aggregation meta-analysis methods (for 
gene g fixed), in which the test statistics are defined as the sum of selected 
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transformations of p-values for each individual study. Without loss of gen¬ 
erality, assuming that Fx{-) is the cumulative distribution function (CDF) 
of a continuous random variable X, the test statistic T is defined as 

K K 

( 2 . 1 ) T = J2Tk:=Y.F^\pk), 

i=l k=l 

where pk is the p-value from the fcth study. Theoretically, X can be any 
continuous random variable. However, in practice, X is usually selected such 
that the test statistic T follows a simple distribution. For instance, when 
X ~ Yn, it holds T ~ Y^^' (Fisher’s method) and T ~ N(0, K) holds, provided 
X ~ N(0,1) (Souffer’s method). 

The hypothesis that corresponds to testing the homogeneous effect sizes 
of K studies by evidence aggregation methods is a union-intersection test 
(UIT) [Roy (1953)]: 

K K 

(2.2) Ho:f]{ek = 0} versus HA-.[J{9k ^0}- 

k=l k=l 

In this paper, we focus on two popular special cases: 

1. Fisher’s method [Fisher (1925)]: When X ~ xh = F^^{pk) = 

-21og(pfc). 

2. Stouffer’s method [Stouffer (1949)]: When X ~ N(0,1), = F^^{pk) = 

^~^{Pk)- 

Another example is the logit method [Hedges and Olkin (1985)], where 
Tfc = — log( But since this method is rarely used in practice, we will not 

examine it further in this paper. To apply the evidence aggregation meta¬ 
analysis methods mentioned above, all the p-values should be observed. How¬ 
ever, in genomic applications, it often happens that p-values of some studies 
are truncated and only their ranges are reported. Two naive methods are 
commonly used to overcome this situation: the vote counting method or the 
available-case method which only combines studies with observed p-values. 
The available-case method discards rich information contained in the stud¬ 
ies with truncated p-values and, therefore, the statistical power is reduced. 
Hedges and Olkin (1980) showed that the power of vote counting converges 
to 0 when many studies of moderate effect sizes are combined and, there¬ 
fore, the vote counting method should be avoided whenever possible. In this 
section, three imputation methods—mean imputation, single random impu¬ 
tation and multiple imputation method—are proposed and investigated to 
combine studies with truncated p-values and the corresponding null distri¬ 
butions are derived analytically, respectively. We first define some notation. 

Assume that K independent studies are to be combined and pi,... ^pk are 
the corresponding p-values. Without loss of generality, assume that all the 
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p-values are available in the first Ki studies and only the indicator function 
of DE evidence is reported in the other K 2 studies. 

Define a pair {ci,Xi),i = 1,... ,K for each study, in which Cj is the “cen¬ 
soring” indicator satisfying 


J 0, if Pi is observed (i.e., 1 <i < Ki), 

\ 1, if Pi is censored (i.e.jETi -|- 1 < i < K), 


and Xi is the final observed values which is defined as 


(2.4) 


Jpi, ifci = 0, 

if Ci — Ij 


where a* is the p-value threshold for study i {Ki + 1 <i < Ki + K 2 = K). 
For each i = 1,2,..., FT, one can impute the missing value by pp 


Pi Pi ■ l{ci=0} 4“ [ft ' ^{xi=l} 4“ D ■ l{a;i=0}] ‘ ^{ci=l} 

with Qi € (0,aj), and r* € [a*, 1). Sections 2.2-2.4 develop three imputation 
methods for selection of qi and r*. 


2.2. Mean imputation method. The simplest imputation method is the 
mean imputation method, in which Qi = ^ and r* = Then the test 

statistic T for truncated data satisfies 

K K Ki K 2 

^ =E =E ^x\p^)= E (p*) +E ^x (M+i) 

i=l j=l 


(2.5) 

2 = 1 2=1 

K 2 


II 

M 

J” 



with 



( 2 . 6 ) 


Ki 


A = J2^x^iPi) and 


2 = 1 


Bj — (pKi+j) 


= F- 


-if 0!Ki+j 


■ ^{PKi+j<aKi+j} 






for j = 1,, K 2 . Recall that under the null hypothesis, the random variable 
A satisfies A ~ X 2 K 1 f^^ Fisher method and A ~ N(0, Ki) for the Stouffer 
method. Obviously, Bj follows a Bernoulli distribution. 

The results can be summarized into the following theorem (proof left to 
Appendix B.l): 
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Theorem 1. For j = 1,2,..., K 2 and given t, by defining 

1 / I + aKi+j 




(2.7) 

it holds 

F{f < t) 


-F 


X 


and 


K 2 


i=i 


-1 A + “Ai+J 


( 2 . 8 ) 


K 2 


K 2 


Y. ^^FA[t-c- ^jibi , 


(il,...jK2)6{0,l}^2i=l 


2=1 


where Fa{-) is the CDF of A. Given the CDF, the expected values of test 
statistic T under null distributions can be calculated as follows: 


1. For Fisher’s method, it holds 


K 2 


E(r) = 2iCi -2^ 
i=i 


aAi-i-jlog 


OiK^+j 


+ {I - aK^+j)\og 


1 -|- CXKi+j 


while the expectation of the original T is E(T) = 2Ki + 2 K 2 = 2K. 
2. For Stouffer’s method, it holds 


K 2 


E(r) = ^ 

j = l L 


aKi+j^~ 


-1 / OlKr+j 


+ (1 - OlKi+j)^ 


1 / I + OLKi+j 


while the expectation of the original T is E(T) = 0. 

Note that there are 2^^ terms summation in the right-hand side of equa¬ 
tion (2.8), which may cause severe computing problem when K 2 is large. 
However, when some a, are equal, the formula can be simplified. With¬ 
out loss of generality, assume there are r > 1 different p-value thresholds 
{/3i,..., fir} such that 

K 2 K 2 

• • • ! ^ '^{aK^+j=0r} ~ 

J=1 i=l 

r 

'^ni = K2, 

1=1 


(2.9) 
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then by defining f{j]ni,/3i) := ^ for j = 0,..., and 

I = 1,..., r, the formula can be simplified as 


p(r < t) 


ni Tir r 

(2.10)= ^ X] 

ii=o jj.=oi=i 





Therefore, the summation is reduced from 2^^ terms to n[=i(^i + 1) terms. 

From the above theorem one concludes that T is a biased estimator of the 
original T. This motivates the following two stochastic imputation methods. 


2.3. Single random imputation method. It is well known that the mean 
imputation method will underestimate the variance of {pki +j}j=i [Little 
and Rubin (2002)]. Furthermore, Theorem 1 indicates that the test statistic 
T from the mean imputation method is a biased estimator of the original T. 
To avoid this problem, one can replace the mean by randomly simulating qi 
and Vi from Uniform(0,Oi) and Uniform(Q:j, 1), respectively. 

Recall that for j = 1,K 2 , Bj = F^^{pKi+j). The next theorem (proof 
left to Appendix B.2) states that Bj ~ X holds under the null hypothesis, 
that is, Bj and X follow the same distribution. 

Theorem 2. For j = 1,2, ..., K 2 , it holds 
( 2 . 11 ) Bjr^X. 

The following corollary is a simple consequence of the above theorem. 


Corollary. For the single random imputation method, the following 
facts hold for T: 

1. For Fisher’s method, it holds Bj ~ xi therefore 

2. For Stoujfer method, it holds Bj ~ N(0,1) and therefore r~ N(0, K). 

Therefore, in this case, T is an unbiased estimator of T defined in equa¬ 
tion (2.1). 

2.4. Multiple imputation method. Although the single random imputa¬ 
tion method allows the use of standard complete-data meta-analysis meth¬ 
ods, it cannot reflect the sampling variability from one random sample. The 
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multiple imputation method (MI) overcomes this disadvantage [Little and 
Rubin (2002)]. In MI, each missing value is imputed D times. Therefore, 
is a sequence of test statistics which are defined as 

K K2 

(2.12) f^ = Y,FxHp'i) = A + '£B] foTl = l,...,D 

i=l j=l 

with 

(2.13) ~ Uniform(0, a*) and ~ Uniform(aj, 1). 

The test statistic is dehned as T = -^ which satisfies 


_ 

t = a + Y, 


V D 

j=i L \ 1=1 


1 ^ \ 


+j<°‘Ki+j} 


D 


(^Ai+jr) j ■ 


1=1 


K2 




, D 
j=l L \ l=l 


D ' '^{PKi+]<°‘Ki+j} ^ { D ^ ^ 1 


i+i>axi+j} 


1=1 


K2 K2 

A + '^[Wj ■ + Vj • (1 - l{pKi+i<axi-Hj})] = ^ + 52 

i=i i=i 


Since Zj = Wj with probability cxKi+j and Zj = Vj with probability 1 — 
aKi+j, Zj is a mixture distribution of Wj and Vj and, therefore, T — A is 
a mixture distribution of {Wj, Rj, j = 1 ,..., K^}- 

Note that ITj and Vj are independent and identically distributed (i.i.d.) 
for fixed j. Denoting by {PVjj(Pv.) the mean and variance of ITj 

and Vj, respectively, then by the central limit theorem one concludes that 
for large enough D > 0 it holds 

T=(;tE'L')~N(w,.^) and 

Then the following theorem holds. 
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Theoj^m 3. ForJ^ji ,..., 3 x 2 )^ {0,1}^^; defining C/(ji,..., jK 2 ) = 
+ (1 - which satisfies 

U{jl,...,jK2) 

(2.14) 

-K 2 ^ K 2 

~ N + (1 - ji)hVj), ^ + (1 - ji)<^Vj) 

_2=1 i=l 

then for sufficiently large D, it holds approximately that 


F{T<t) 

(2.15) 

K 2 

(jl,.-JX2)6{0,l}^2 »=1 

The detailed notation is left to Appendix C. 


Similar to the mean imputation method, the formula can be simplified 
when some p-value thresholds are equal, that is, 

ni Ur r 

(2.16) p(r<t) = E En f{3i;ni, + U{ji,... ,jr) < t), 

jl=0 jr = 0l = l 

with U{ji,...,jr) = Td=iUiP'x^iQi) + (ni - iO^x^(n)), ® ~ Uniform(0, A) 
and n ~ Uniform(/3;, 1). 

3. Simulation results. 


3.1. Simulated expression profiles. To evaluate performance of the pro¬ 
posed imputation methods in the genomic setting, we simulated expression 
profiles with correlated gene structure and variable effect sizes as follows: 

Simulate gene correlation structure for G = 10,000 genes, N = 100 sam¬ 
ples in each study and K = 10 studies. In each study, 4000 of the 10,000 
genes belong to C = 200 independent clusters. 

Step 1. Randomly sample gene cluster labels of 10,000 genes {Cg G {0,1, 
2,... ,C} and 1 < g < G), such that C = 200 clusters each containing 20 
genes are generated = c) = 20, VI < c < C = 200] and the remaining 

6000 genes are unclustered genes l(C'g = 0) = 6000]. 

Step 2. For any cluster c(l < c < C) in study k{l <k< K), sample ~ 
VF“^('I', 60), where ^ = 0.5/20x20 + 0.5J20X2O) denotes the inverse 

Wishart distribution, I is the identity matrix and J is the matrix with 
all the entries being 1. Set vector cjcfc as the square roots of the diagonal 
elements in S),^. Calculate Scfc such that (TckFckk^k ~ '^'gk- 
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Step 3. Denote , g^fj as the indices for genes in cluster c. In other 

words, C (c) = c, where 1 < c < 200 and 1 < j < 20. Sample the expression 

^3 

of clustered genes by {X'u) ^ - ~ MVN{0,T;ck), where 1 < n < 

g\ >nk g 2 Q nk 

N = 100 and 1 < A: < = 10. Sample the expression for unclustered genes 

^'gnk ~ N(0,1) for 1 < n < iV and 1 < A: < K if Cg = 0. 

Simulate differential expression pattern. 

Step 4. Sample effect sizes g,gk from Unif(0.1,0.5) for 1 < 5 < 1000 as DE 
genes and set figk = 0 for 1001 < g <G as non-DE genes. 

Step 5. Eor the first 50 control samples, Xgnk = ^9 <n< 

N/2 = 50,1< k<K). Eor cases, Yg^k = -’^g(„+5o)fe + l^gk (1 < 5 < G, 1 < n < 
fV/2 = 50,l<A:<E:). 

In the simulated data sets, K = 10 studies with G = 10,000 genes simulated. 
Within each study, there were y = 50 cases and 50 controls. The first 1000 
genes were DE in all 10 studies with effect sizes randomly simulated from 
a uniform distribution on (0.1,0.5), respectively, and the remaining 9000 
were non-DE genes. We chose this effect size range to produce an averaged 
standardized effect size at = 0.1414 so that the DE analysis generates 
~500-600 candidate DE genes (Table 1), a commonly seen range in real 
applications. In each study, 200 gene clusters existed, each containing 20 
genes. The correlation structure within each cluster was simulated from an 
inverse Wishart distribution. 

In the simulations, we performed a two sample A-test for each gene in 
each study and then combined the p-values using the imputation methods 
proposed in this paper. Eor simplicity, we viewed the p-values from the last 5 
studies as truncated with thresholds {ai,... ,af) = ( 0 . 001 , 0 . 001 , 0 . 01 , 0 . 01 , 
0.05), respectively. In most genomic meta-analysis, researchers often use 
conventional permutation analysis by permuting sample labels to compute 
the p-values to preserve gene correlation structure. However, such a non- 
parametric approach is not applicable in our situation, since raw data are 
not available in some studies. In order to control the false discovery rate 
(EDR), we examined the Benjamini-Hochberg (B~H) method [Benjamini 
and Hochberg (1995)] and the Benjamini-Yekutieli (B-Y) method [Ben¬ 
jamini and Yekutieli (2001)] separately. The number of DE genes detected 
at nominal EDR rate 5% were recorded and the true EDR rates were com¬ 
puted for each meta-analysis method by 

^gl(gene g detected with g > 1001 ) 

EDR =- — ----. 

#|genes detected) 

In the multiple imputation method, D = 50 was selected. Simulations 
were repeated for 50 times and the mean and standard errors of numbers 
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Table 1 

Simulation results for correlated data matrix at nominal FDR = 5% 


Fisher Stouffer 

Method/Mean (s.e.) No. DE True FDR No. DE True FDR 


B-H 


Complete cases 

632.9 

(32.5) 

Available-case 

263.5 

(37.4) 

Mean imputation 

508.6 

(35.1) 

Single imputation 

408.9 

(35.7) 

Multiple imputation 

509.2 

(35.0) 


0.043 (0.0013) 
0.048 (0.0076) 
0.046 (0.0016) 
0.043 (0.0018) 
0.045 (0.0015) 


518.6 

(36.2) 

0.046 

(0.0015) 

216.8 

(35.3) 

0.064 

(0.022) 

449.8 

(36.2) 

0.047 

(0.0022) 

293.9 

(32.6) 

0.045 

(0.0027) 

463.8 

(35.7) 

0.050 

(0.0019) 


Complete cases 

354.0 

(34.4) 

Available-case 

102.4 

(21.9) 

Mean Imputation 

234.5 

(32.1) 

Single Imputation 

164.0 

(27.3) 

Multiple imputation 

235.3 

(32.0) 


0.0041 (0.00083) 
0.0047 (0.0012) 
0.0037 (0.00074) 
0.0057 (0.0014) 
0.0037 (0.00075) 


261.7 

(33.9) 

0.0036 

(0.00097) 

82.8 

(20.6) 

0.0029 

(0.00096) 

203.8 

(30.8) 

0.0034 

(0.00073) 

113.5 

(22.3) 

0.0039 

(0.0015) 

216.1 

(30.9) 

0.0050 

(0.0010) 


of DE genes controlled by B-H and B-Y methods and their true FDR are 
reported in Table 1. The results showed that the FDRs were controlled 
well for B-H correction but rather conservative for B-Y correction (the 
true FDR of B-Y is only 1/10 of B-H at nominal FDR = 5%). This is 
consistent with the previous observation that the B-Y adjustment tends to 
be over-conservative since it guards against any type of correlation structure 
[Benjamini and Yekutieli (2001)]. As a result, the B-H correction will be used 
for all applications hereafter. The simulation results showed consistently 
that imputation methods had higher statistical power than the available- 
case method, and the mean imputation and multiple imputation methods 
outperform the single random imputation method with similar performance. 
Surprisingly, the ratio of detected DE genes compared to the complete case 
increased from 41.6% in the available case (263.5/632.9) to 80.4% in mean 
imputation (508.6/632.9) using Fisher’s method. The improvement is even 
more signihcant using Stouffer’s method (from 41.8% to 86.7%), while at the 
same time the true FDRs were controlled at a similar level for all methods. 
The result shows that imputation methods successfully utilize the incomplete 
p-value information to greatly recover the detection power. 

We further examined the situation when gene dependence structure does 
not exist [i.e., steps 1-3 were skipped and ~ N(0,1)]. Table 2 shows the 
true type I error control under nominal signihcance level 5% (i.e., true type 

9 is detected at significance level 0 . 05 ). 

1 error = —^-). The result shows 

adequate type I error control and confirms the validity of the closed form or 
approximated formula of different imputation methods in Section 2. 

To investigate the impact of D on the performance of the multiple impu¬ 
tation method, simulations were performed for D G {20,30,50,100,150,200, 
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Table 2 

Type I error control for independent data matrix at nominal significance level 5% 



Fisher 

Stouffer 

Complete cases 

0.050 (0.00031) 

0.050 (0.00037) 

Available-case 

0.050 (0.00035) 

0.050 (0.00033) 

Mean imputation 

0.050 (0.00031) 

0.050 (0.00033) 

Single imputation 

0.050 (0.00032) 

0.051 (0.00032) 

Multiple imputation 

0.050 (0.00031) 

0.051 (0.00031) 


250,300,500}. The result is shown in Appendix A, Figure 3, which demon¬ 
strates that the performance of the multiple imputation method is quite 
robust for different number of imputation D. We use D = 50 throughout 
this paper. 

3.2. Simulation from complete real data sets. In this subsection the pro¬ 
posed methods were applied to two real microarray data sets, including 7 
prostate cancer studies [Gorlov et al. (2009)] and 8 major depressive disorder 
(MDD) studies [Wang et al. (2012)]. The details are summarized in Supple¬ 
ment Table 1 [Tang et al. (2014)]. For each data set, about half of the studies 
(four for MDD and three for prostate cancer) were randomly selected with 
p-value truncation threshold 0.05. Five methods including complete data, 
available-case, single random imputation, mean imputation and multiple 
imputation methods were applied to the data sets with the simulated in¬ 
complete data to impute by Stouffer’s and Fisher’s methods, respectively. 
The generated p-values were corrected by the B-H method and the simula¬ 
tion was repeated for 50 times. Figure 1 shows boxplots of the numbers of 
differentially expressed (DE) genes at FDR = 1% for different methods in 
MDD and FDR = 0.5% for prostate cancer data. Figure 1 indicates similar 
conclusions that the multiple imputation and the mean imputation methods 
detect more DE genes than the available-case method and single random im¬ 
putation method. In the MDD example, very few DE genes (average of 16 
and 83 for Fisher and Stouffer, resp.) were detected using the available-case 
method if half of the studies have truncated p-values. The mean and mul¬ 
tiple imputation methods greatly improved the detection sensitivity. About 
95.2% (Fisher) and 96.3% (Stouffer) of DE genes detected by the mean 
imputation method overlapped with DE genes detected by complete data 
analysis in MDD and about 94.7% (Fisher) and 88.1% (Stouffer) of DE 
genes detected by the mean imputation method overlapped with DE genes 
detected by complete data analysis in prostate cancer, showing the ability 
of imputation methods to recover DE gene detection power. 
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Fig. 1. Number of DE genes detected by Fisher’s or Stauffer’s method. C: complete data; 
A: available-case; Me: mean-imputation; S: single-imputation; Mu: multiple imputation. 


4. Applications. 

4.1. Application to colorectal cancer. In the first motivating example, 
we followed Chan et al. (2008) and attempted to collect 23 colorectal cancer 
versus normal gene expression profiling studies. Raw data were available in 
only one study [Bianchini et al. (2006)] and 4 of the other 22 studies contain¬ 
ing more than 100 DE genes at different p-value thresholds were included in 
our analysis. We searched the GEO database and identified two additional 
new studies [Jiang et al. (2008) and Bellot et al. (2012)]. The seven studies 
under analysis were summarized in Table 3. After gene-matching, 6361 genes 
overlapped in all three studies with raw data. The available-case method, 
the mean imputation method, the single random imputation method and 
the multiple imputation method were applied for the seven studies for the 
Eisher and Stouffer methods, respectively, and the results were reported in 
Table 4. For the single random imputation method and the multiple im¬ 
putation method, the analyses were repeated 50 times and the mean and 
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Table 3 

Seven colorectal cancer versus normal tissue expression profiling studies included in 

analysis 


Study 

No. of 
samples 

No. of 

genes 

Raw data 
availability 

No. of DE 

genes 

No. of overlapped 
DE genes 

p-value 
threshold 

Biancliini_2006 

24 

7403 

GSE3294 

- 

- 

- 

Bellot_2012 

17 

18,191 

GSE24993 

- 

- 

- 

Jiang_2008 

48 

18,197 

GSE10950 

- 

- 

- 

Grade_2007 

103 

21,543 

„ 

1950 

635 

le-7 

Croner_2005 

33 

22,283 

- 

130 

47 

0.006 

Kim 2004 

32 

18,861 

- 

448 

143 

0.001 

Bertucci_2004 

50 

8074 

- 

245 

97 

0.009 


standard error of the number of DE genes detected were reported under 
FDR control by the B-H method. The results demonstrate that for various 
FDR thresholds, the mean imputation method and the multiple imputation 
method detected more DE genes than the available-case method and the 
single random imputation method, which was consistent with previous find¬ 
ings in simulations. Under FDR = 0.01% control, the Fisher and Stouffer 
mean imputation detected 2.07 (1183/571) and 10.35 (383/37) times of DE 
genes than those by the available-case method, respectively. 

4.2. Application to pain research. The second motivating example comes 
from the meta-analysis of 20 microarray studies of pain to detect the patterns 
of pain [LaCroix-Fralish et al. (2011)]. The original meta-analysis utilized 
DE gene lists from each study under different threshold criteria from p-value, 
FDR or fold change and identihed 79 “statistically signihcant” genes that 
appeared in the DE gene lists of four or more studies. The vote counting 
method essentially lost a tremendous amount of information with flawed 
statistical inference. When we attempted to repeat the meta-analysis, raw 
data of only one of the 20 studies (Barr_2005) could be found. The old plat¬ 
form used in that study, however, contained only 792 genes and had to be 


Table 4 

Summary of results for colorectal cancer 




Fisher 


Stouffer 


FDR 

Available Mean Single Multiple 

Available Mean Single 

Multiple 

1% 

2587 

2855 2172.4 (2.90) 2785.4 (2.93) 

1318 

1675 668.4 (3.96) 1616.0 (2.10) 

0.1% 

1472 

1874 1265.6 (2.34) 1805.7 (1.50) 

299 

709 252.7 (1.93) 

680.5 (1.12) 

0.01% 

571 

1183 748.4 (1.89) 1138.6 (2.00) 

37 

383 102.5 (1.65) 

366.7 (0.69) 
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Table 5 

Summary of results for patterns of pain 



Fisher 

Stouffer 

Mean 

280 

45 

Single 

57.04 (1.6228) 

16.44 (0.8605) 

Multiple 

280.36 (0.8105) 

77.56 (0.6616) 


excluded from further meta-analysis. In the remaining 19 studies, 11 studies 
contained DE gene lists under various p-value thresholds (marked bold in 
Supplement Table 2 [Tang et al. (2014)]) and were included in our applica¬ 
tion. In other words, this example contained exclusively only studies with 
truncated p-values. Table 5 shows the results of three imputation methods. 
Fisher and Stouffer identified 280 and 45 genes under 5% FDR control, re¬ 
spectively. Note that the original meta-analysis tested the 79 genes using 
an overall binomial test and the statistical signihcance was controlled at an 
overall p-value level, not at a gene-specific FDR level. As a result, DE gene 
lists from the new imputation methods are theoretically more powerful and 
accurate. 

To validate the finding, we used the Gene Functional Annotation tool 
from the DAVID Bioinformatics Resources website (http://david.abcc. 
ncif erf .gov). DAVID applied a modified Fisher’s exact test to evaluate the 
association between the DE gene lists and pathways. Functional annotation 
of the 280 DE genes from the Fisher’s mean imputation method identified 
208 pathways at FDR = 5%, among which selected important pain-related 
pathways were grouped into five major biological categories and displayed in 
Table 6. In contrast, the 79 genes from vote counting identihed only 14 path¬ 
ways, of which the expected pain-related pathways under the categories of 
inflammation and of differentiation, development and projection are missing 
(see Table 6). The pathway enrichment g-values after multiple comparison 
control of the “280 gene list” were very significant, while those of the “79 
gene list” were not. Since the p-value calculation from Fisher’s exact test 
can be impacted by the DE gene size, we further compared the enrichment 
odd-ratios of genes in the pathway versus in the DE gene list. Still the en¬ 
richment odds-ratios of the “280 gene list” were generally much higher than 
those for the “79 gene list,” showing stronger pain functional association 
from the Fisher’s mean imputation method. 

4.3. Application to a three-way association method (liquid association). 
So far the proposed imputation methods were applied successfully to two 
real microarray data sets of colorectal cancer and pain research in which the 
actual p-values of some genes were not reported in a subset of studies. In 





Table 6 

Summary of pathway analysis by DAVID 


Category 


Pathway ID 

280 DE 

(Fisher’s mean imputation) 

odds 

p-value q-value ratio 

79 DE 

(Vote counting) 

odds 

p-value q-value ratio 

Differentiation, 

GO : 0030182 

~ neuron differentiation 

5.6e-6 

0.0006 

3.1 

0.26 

0.95 

1.6 

development and 

GO : 0045664 

~ regulation of neuron differentiation 

1.6e-5 

0.0011 

4.7 

0.37 

0.98 

1.9 

projection 

GO ; 0048666 

neuron development 

2.5e-6 

0.0003 

3.6 

0.24 

0.94 

1.7 


GO ; 0051960 

regulation of nervous system development 

6.5e-6 

0.0006 

4.2 

0.29 

0.96 

1.9 


GO ; 0031175 

neuron projection development 

1.6e-5 

0.0012 

3.7 

0.27 

0.96 

1.8 


GO : 0042995 

~ cell projection 

3.6e-ll 

3.2e-9 

3.5 

0.033 

0.47 

1.9 


GO : 0043005 

~ neuron projection 

3.0e-ll 

3.4e-9 

4.3 

0.043 

0.51 

2.0 


GO ; 0030030 

~ cell projection organization 

1.6e-5 

0.0012 

3.3 

0.24 

0.94 

1.7 

Response 

GO : 0009611 

~ response to wounding 

3.8e-10 

2.8e-7 

4.3 

2.7e-5 

0.016 

3.6 

to stimuli 

GO ; 0009719 

response to endogenous stimulus 

3.2e-8 

1.7e-5 

3.4 

0.35 

0.97 

1.3 


GO ; 0048584 

~ positive regulation of response to stimulus 

7.9e-8 

2.5e-5 

4.9 

0.0049 

0.34 

3.6 


GO : 0032101 

~ regulation of response to external stimulus 

l.le-5 

0.001 

4.8 

0.043 

0.71 

2.8 

Immune 

GO ; 0050778 

~ positive regulation of immune response 

4.2e-7 

7.6e-5 

5.9 

0.018 

0.57 

4.0 


GO : 0002684 

~ positive regulation of immune system process 

1.9e-6 

0.0003 

4.4 

0.0009 

0.13 

4.2 


GO : 0006956 

~ complement activation 

3.0e-5 

0.0016 

11.5 

0.011 

0.46 

8.4 


GO ; 0002478 
of exogenous 

~ antigen processing and presentation 
peptide antigen 

1.3e-6 

0.00022 

19.0 

0.00098 

0.12 

10.64 

Inflammation 

GO ; 0002673 

~ regulation of acute inflammatory response 

1.4e-6 

0.0002 

14.1 

0.19 

0.93 

3.8 


GO ; 0002526 

acute inflammatory response 

7.1e-6 

0.0007 

6.7 

0.012 

0.48 

4.4 


GO : 0050727 

~ regulation of inflammatory response 

1.9e-5 

0.0012 

6.9 

0.17 

0.92 

2.8 


GO : 0006954 

~ inflammatory response 

1.5e-5 

0.0012 

4.1 

0.001 

0.11 

3.8 

Regulation of 
Transmission 

GO : 0051969 

~ regulation of transmission of nerve impulse 

6.0e-6 

0.0006 

4.8 

0.057 

0.80 

2.4 
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this section we show that the proposed imputation methods can be useful 
in the meta-analysis of “big data” such as GWAS or eQTL, where the main 
computational problem is often the data storage. 

In the literature it has long been argued that positively correlated ex¬ 
pression profiles are likely to encode functionally related proteins. Liquid 
association (LA) analysis [Li (2002)] is an advanced three-way co-expression 
analysis beyond the traditional pairwise correlations. For any triplet of genes 

X, Y and Z, the LA score LA(X,Y\Z) measures the effect that expression 
of Z to control on and off of the co-expression between X and Y. For ex¬ 
ample, high expression of Z turns on positive correlation between X and 

Y, while when expression of Z is low, X and Y are negatively or noncorre- 

lated. Theory in Li (2002) simplihed calculation of the LA score to a linear 
order of sample size and made the genome-wide computation barely feasible. 
Supposing we want to combine K studies of the liquid association, liquid 
association p-values of all triplets in all LC = 10 studies have to be stored for 
meta-analysis. When the number of genes G = 1000, the number of p-values 
to be stored is G ■ • K = 4.985 GB. For a reasonable G = 20,000 genome¬ 

wide analysis, storage size for all p-values quickly increases to 39.994 TB. 
One may argue that univariate (i.e., triplet by triplet) meta-analysis may be 
applied repeatedly to avoid the need of storing all p-value results. There are 
many other genomic meta-analysis situations when this may not be feasible. 
For example, in GWAS meta-analysis under a consortium collaboration, raw 
genotyping data cannot be shared for privacy reasons and only the derived 
statistics or p-values can be transferred for meta-analysis. Below we describe 
how imputation methods can help circumvent the tremendous data storage 
problem. 

We performed a small scale of analysis on 566 DE genes previously re¬ 
ported from the meta-analysis of the eight MDD studies used in Section 3.2 
[Wang et al. (2012)]. The total number of possible triplets {X,Y\Z) was 
90,180,780. By setting up a p-value threshold at 0.001, we only needed to 
store exact p-values for 2,094,123 (~2.32%) triplets and the remaining were 
truncated as considered in this paper. Since we also needed to store the trun¬ 
cation index information, we only needed to store 2 x 2.32% = 4.64% of the 
information and the compression ratio was 95.36%. To investigate the loss 
of information by the truncation. Figure 2 shows meta-analysis p-values [at 
— log(p) scale] from Fisher’s method using full data and the Fisher mean im¬ 
putation method using truncated data. The result shows high concordance in 
the top significant triplets, which are the major targets of this exploratory 
analysis. Among the top 1000 triplets detected by Fisher’s method using 
complete p-value information, 83.7% of them were also identified by the top 
1000 by Fisher mean imputation. The remaining 163 triplets were still in 
top ranks (rank between 1199 and 4763) using truncated data in the re¬ 
sult of Fisher mean imputation. This result suggests a good potential of 
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Fig. 2. — log(p) comparison of the mean imputation method using truncated data with 

the complete case method using complete data. Vertical line: x = 71.3. Horizontal line: 
i/ = 72.58. Points right to vertical line are top 1000 triplets detected by Fisher’s complete 
case method, and points above to horizontal line are top 1000 triplets detected by Fisher’s 
mean imputation method. 

applying data truncation to preserve the most informative information and 
performing imputation to approximate the finding of the top targets when 
meta-analysis of “big data” is needed. The compression ratio may further 
increase by a more stringent truncation threshold, but the performance may 
somewhat decline as a trade-off. 

5. Discussion and conclusion. When combining multiple genomic stud¬ 
ies by p-value combination methods, the raw data are often not available 
and only the ranges of p-values are reported for some studies in genomic ap¬ 
plications. This is especially true for microarray meta-analysis since owners 
of many microarray studies tend not to publish their data in the public do¬ 
main. This incomplete data issue is often encountered when one attempts to 
perform a large-scale microarray meta-analysis. If raw data are not available, 
two naive methods—vote counting method and available-case method—are 
commonly used. Since these two methods completely or largely neglect the 
information contained in the truncated p-values, their statistical power is 
generally low. In this paper, we proposed three imputation methods for a 
general class of evidence aggregation meta-analysis methods to combine in¬ 
dependent studies with truncated p-values: mean imputation, single random 
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imputation and multiple imputation methods. For each proposed imputa¬ 
tion method, the null distribution was derived analytically for the Fisher and 
Stouffer methods. Theoretical results showed that the test statistics from 
the single random imputation and the multiple imputation methods were 
unbiased, while those for mean imputation methods were biased. Simula¬ 
tions were performed for the imputed Fisher method and imputed Stouffer 
method. The simulation results showed that type I errors were well con¬ 
trolled for all methods, which was consistent with our theoretical derivation. 
Compared to the naive available-case method, all the imputation methods 
achieved higher statistical powers, and the mean imputation and the mul¬ 
tiple imputation methods recovered much of the power that the complete 
cases method achieved even when half of the studies had truncated p-values. 
Furthermore, Figure 3 in Appendix A showed that the power of the multiple 
imputation method was robust to the number of imputation D. Although 
small to moderate D provided good results, we recommend choosing D be¬ 
ing larger than 50 to guarantee that the central limit theorem can approxi¬ 
mate well. Applications to two motivating examples in colorectal cancer and 
pain conditions showed that both mean imputation and multiple imputation 
performed among the best in terms of detection sensitivity and biological 
validation by pathway analysis. 

In regression-type missing-data imputation methods, the null distribution 
of the error term is unknown and is assumed to be normally distributed with 
equal variance, a setting in which the multiple imputation method usually 
outperforms the mean imputation in practice and in theory [Little and Ru¬ 
bin (2002)], particularly because mean imputation underestimates the true 
variance. However, our simulation results demonstrated that the power of 
the two methods were quite similar. Two reasons may contribute to this 
result. First, although the test statistic from the mean imputation method 
is biased and neglects the variation of truncated p-values, its p-value can 
be computed accurately when the null distribution is derived analytically. 
Second and more importantly, we hnd that the test statistic of mean impu¬ 
tation is in fact F^^(E(p)), while for sufficiently large D, the test statistic 
of multiple imputation converges to E(F^^(p)) in distribution. It is easy to 
show that these two quantities are very close to each other for a small range 
of p, provided is smooth. Since is infinitely differentiable for 

the Fisher and Stouffer methods, and the small p-value range in (0,a) is 
particularly of interest to us, it is not surprising that the mean imputa¬ 
tion method and multiple imputation method perform similarly. Since the 
mean imputation method achieved almost the same power as the multiple 
imputation method with less computational complexity, it is more appeal¬ 
ing and is recommended for microarray meta-analysis, where the imputed 
meta-analysis method is performed repeatedly for thousands of genes. In this 
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paper only the evidence aggregation meta-analysis methods are investigated 
and further work will be needed to extend these results to order statistic 
based methods such as minP and maxP. 

Note that although the truncated p-value issue discussed in this paper 
may appear similar to the problem of “publication bias,” it is fundamen¬ 
tally different. Publication bias refers to the fact that a study with a large 
positive treatment effect is more likely to be published than a study with a 
relatively small treatment effect, resulting in bias if one only considers pub¬ 
lished studies. Denote hy pi,p2, ■ ■ ■ ,Pn the p-values of all conducted studies 
that should have been collected. Only a subset of likely more significant 
p-values pi,p 2 ,...,Pn are observed. Under this setting, N is unknown and 
Pn+i,... ,pn are unknown as well. Since the number of missing publica¬ 
tions is unknown, Duval and Tweedie proposed the “Trim and Fill” method 
to identify and correct for funnel plot asymmetry arising from publication 
bias [Duval and Tweedie (2000a) and (2000b)], in which an estimate of the 
number of missing studies is provided and an adjusted treatment effect is 
estimated by performing a meta-analysis including the imputed studies. For 
the truncated p-value problem we consider here, the total number of studies, 
the number of studies with truncated p-values and the p-value truncation 
thresholds are all known. Therefore, investigation of the imputation of trun¬ 
cated p-values in meta-analysis is different from the traditional “publication 
bias” problem and has not been studied in the meta-analysis literature, to 
the best of our knowledge. 

In this paper the methods we developed mainly target on microarray 
meta-analysis, but the issue can happen frequently in other types of genomic 
meta-analysis [e.g., GWAS; Begurn et al. (2012)]. In Section 4.3 we demon¬ 
strated an unconventional application of our methods to meta-analysis of 
liquid association. Due to the large number of triplets tested in the three- 
way association, the needed p-value storage is huge. By preserving only the 
most informative data by truncation, the storage burden is greatly allevi¬ 
ated and our imputation methods help approximate and recover the top 
meta-analysis targets with little power loss. In an ongoing project, we also 
attempt to combine multiple genome-wide eQTL results via meta-analysis. 
In eQTL, regression analysis is used to investigate the association of a SNP 
genotyping and a gene expression. It is impractical to store all genome-wide 
eQTL p-values, as the storage space required is too large (25,000 genes x 
2,000,000 SNPS = 5 X 10^*^ p-values). A practical solution is to record only 
the eQTL p-values smaller than a threshold (say, 10“^) for meta-analysis, 
which leads to the same statistical setting as discussed in this paper. In 
another project we combine results from multiple ChIP-seq peak calling al¬ 
gorithms to develop a meta-caller. Since each peak caller algorithm can only 
report the top peaks with p-values smaller than a certain p-value threshold, 
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we again encounter the same truncated p-value problem in meta-analysis. 
As more and more complex genomic data are generated and the need for 
meta-analysis increases, we expect the imputation methods we propose in 
this paper will find even more applications in the future. 

APPENDIX A: SUPPLEMENTARY FIGURE 

Fisher's method Stouffer's method 




Number of imputations Number of imputations 

Fig. 3. Power analysis at significance level 0.05 for different numbers of imputation D. 
The dashed lines represent the theoretical asymptotic power obtained by setting D= 1000. 
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and, therefore, 
(B.3) 


K2 K2 

T = A + bjYj + c with c = cj. 
i=i i=i 


For given t, it holds 


K2 


P(T < t) = PI A + biYi Y c<t 


2=1 


K2 


= FiA + '^biYi + c<t\Yi=ji,...,YK2=jK2] 

(B.4) xP(yi=ji,...,PA2=jA2) 

K2 / K2 \ 

Ul,-,jK2)e{0,l}^2 i=l \ i=l / 

K2 / K2 \ 

where Fa{-) is the CDF of A. 

B.2. Proof for Theorem 2. We show that for given t 
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APPENDIX C: SOME PARAMETERS IN THEOREM 3 FOR THE 

STOUFFER AND FISHER METHODS 


(C.l) 


C.l. Stouffer’s method. It is easy to obtain that 
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C.2. Fisher’s method. Similarly, it holds 
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SUPPLEMENTARY MATERIAL 

Supplementary tables (DOL 10.1214/14-AOAS747SUPP; .pdf). Supple¬ 
mentary Table 1: Detailed data sets descriptions; Supplementary Table 2: 
11 pain-relevant microarray studies. 
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